feat(trial): zero-friction URL-to-workspace onboarding MVP #758

Merged
simple-agent-manager[bot] merged 35 commits into main from
sam/trial-onboarding-mvp
Apr 21, 2026

Contributor

@simple-agent-manager simple-agent-manager Bot commented Apr 18, 2026

Summary

Implements the zero-friction URL-to-workspace onboarding MVP from idea 01KPGJQ853C44JEREXWEZS1GQ8. Anonymous visitors paste a public GitHub repo URL, watch a live discovery agent analyze it, and get pre-generated suggestion chips that lead into a full SAM workspace after a 2-click login.

Built as a single orchestrated PR via 5 waves (foundation + 4 parallel tracks + integration) against the sam/trial-onboarding-mvp integration branch. Not to be merged to main — this is flagged for @raphaeltm manual review before merge and before production configuration is applied.

cc @raphaeltm — Configuration Checklist Before Merge

Staging (sammy.party) — zero manual steps required

The deploy pipeline provisions + flips everything automatically:

  • TRIAL_CLAIM_TOKEN_SECRET — auto-generated by Pulumi (infra/resources/secrets.ts), stored encrypted in the Pulumi R2-backed state, pushed as a Worker secret by configure-secrets.sh (commit 086f4ded)
  • trials:enabled=true in KV — written by the staging deploy workflow on every run (deploy-reusable.yml + commit b15ca27c removing an invalid --remote flag)
  • TRIAL_LLM_PROVIDER=workers-ai — already wired in wrangler.toml vars
  • TRIAL_MODEL=@cf/meta/llama-3.1-8b-instruct
  • TRIAL_MONTHLY_CAP=1500
  • sam_anonymous_trials sentinel user — seeded via migration 0043

→ Nothing to click on the staging environment. A fresh workflow_dispatch on deploy-staging.yml gives you a working trial surface.

Production (simple-agent-manager.org) — one manual step (the key)

  • Procure Anthropic API key budgeted for trials
  • wrangler secret put ANTHROPIC_API_KEY_TRIAL --env production (separate from platform key)
  • Set TRIAL_LLM_PROVIDER=anthropic, TRIAL_MODEL=claude-3-5-haiku-latest, TRIAL_AGENT_TYPE=claude-code in production vars
  • Set TRIAL_MONTHLY_CAP to your preferred prod cap (default 500)
  • Flip the kill switch when ready: pnpm --filter @simple-agent-manager/api exec wrangler kv key put "trials:enabled" "true" --binding KV --env production
  • Confirm sam_anonymous_trials sentinel user exists on prod D1
  • Confirm trial_counter KV namespace + TrialCounter DO bindings exist in prod wrangler env

Cookies

  • HMAC key for trial fingerprint cookies reuses TRIAL_CLAIM_TOKEN_SECRET (auto-provisioned on staging, manual on prod if desired).

Kill Switches

  • Set KV trials:enabled=false to instantly pause trial creation. /try cleanly falls back to "Trials are paused" — verified on staging.
  • TRIAL_MONTHLY_CAP=0 is also a hard stop.
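The two stops above can be read as one gate decision. The following is a minimal sketch, assuming the gate receives the raw KV value for trials:enabled and the month's usage count; the function name and input shape are hypothetical, and only the reason codes (trials_disabled, cap_exceeded) appear elsewhere in this PR.

```typescript
// Hypothetical input shape: the real helpers in this PR are not shown here.
type GateInput = {
  killSwitchValue: string | null; // raw KV value for "trials:enabled"; null if missing or unreadable
  monthlyCap: number; // TRIAL_MONTHLY_CAP
  usedThisMonth: number; // slots already consumed this month
};

type GateResult =
  | { allowed: true; remaining: number }
  | { allowed: false; reason: "trials_disabled" | "cap_exceeded" };

function evaluateTrialGate(input: GateInput): GateResult {
  // Fail closed: anything other than the literal "true" pauses trial creation.
  if (input.killSwitchValue !== "true") {
    return { allowed: false, reason: "trials_disabled" };
  }
  // A cap of 0 leaves no remaining slots, so it acts as a hard stop.
  const remaining = Math.max(0, input.monthlyCap - input.usedThisMonth);
  if (remaining === 0) {
    return { allowed: false, reason: "cap_exceeded" };
  }
  return { allowed: true, remaining };
}
```

Treating a missing or unreadable flag the same as "false" is what makes the switch fail closed: a KV outage pauses trials instead of opening them.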

What Shipped

Wave 0 — Foundation (e253c08e)

  • Shared Valibot schemas (packages/shared/src/trial.ts) for requests, responses, SSE events, idea shape
  • D1 migration 0043: trial_projects, trial_waitlist, sam_anonymous_trials sentinel user
  • Durable Objects: TrialCounter (monthly cap), TrialEventBus (SSE fan-out)
  • HMAC-signed cookie helpers (apps/api/src/services/trial/cookies.ts) for fingerprint (7d) and claim (48h) tokens
  • Kill-switch + cap helpers, discovery prompt template, route stubs
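For readers unfamiliar with HMAC-signed cookies, here is a minimal sketch of signing and verifying a token with an expiry and a constant-time comparison. It uses Node's node:crypto for brevity; a Worker would more likely use Web Crypto. The token layout, type, and function names are assumptions and are not taken from cookies.ts.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical token layout: base64url(JSON payload) + "." + base64url(HMAC-SHA256 of that body).
type TrialTokenPayload = { trialId: string; exp: number }; // exp is epoch milliseconds

function sign(data: string, secret: string): string {
  return createHmac("sha256", secret).update(data).digest("base64url");
}

function signTrialToken(payload: TrialTokenPayload, secret: string): string {
  const body = Buffer.from(JSON.stringify(payload), "utf8").toString("base64url");
  return `${body}.${sign(body, secret)}`;
}

function verifyTrialToken(token: string, secret: string, now: number): TrialTokenPayload | null {
  const parts = token.split(".");
  if (parts.length !== 2) return null;
  const [body, mac] = parts;
  const expected = Buffer.from(sign(body, secret), "utf8");
  const actual = Buffer.from(mac, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
  try {
    const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as TrialTokenPayload;
    if (typeof payload.trialId !== "string" || typeof payload.exp !== "number") return null;
    // Reject expired tokens (for example 48h for claim, 7d for fingerprint).
    return payload.exp > now ? payload : null;
  } catch {
    return null;
  }
}
```

Passing `now` explicitly keeps expiry checks testable without mocking the clock.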

Wave 1 Track A — Backend Lifecycle (4ca29ea6)

  • POST /api/trial/create — validates repo URL, checks kill switch + cap, creates project under sentinel user, starts discovery session
  • GET /api/trial/status — enabled + remaining slots + reset date (public, no auth)
  • POST /api/trial/waitlist — cap-exceeded email capture
  • Cron: month-rollover counter reset + 30d waitlist purge

Wave 1 Track B — Backend Claim + SSE (6ba2e101)

  • GET /api/trial/:trialId/events — SSE stream multiplexed from TrialEventBus DO
  • POST /api/trial/:trialId/claim — post-OAuth handler that transfers the anonymous project from sentinel user to the newly-signed-in user, validates claim cookie
  • OAuth callback integration (claim=<trialId> query param round-trip)
  • Agent wiring: discovery session uses TRIAL_LLM_PROVIDER + TRIAL_MODEL
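As an illustration of what the events endpoint puts on the wire, the sketch below formats named events and comment-style heartbeats in the standard Server-Sent Events text format. The helper names and the payload shape are hypothetical; only the trial.* event family and the ": connected" heartbeat come from this PR.

```typescript
// Payload shape is an assumption; the real schemas live in packages/shared/src/trial.ts.
type TrialEvent = { type: string; data: Record<string, unknown> };

// A named SSE event: optional "id:" line, "event:" line, "data:" line, then a blank line.
function formatSseEvent(event: TrialEvent, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event.type}`);
  // JSON.stringify without an indent argument emits no raw newlines, so one data line suffices.
  lines.push(`data: ${JSON.stringify(event.data)}`);
  return lines.join("\n") + "\n\n";
}

// Lines starting with ":" are SSE comments; clients ignore them, which makes them usable as heartbeats.
function formatSseComment(text: string): string {
  return `: ${text}\n\n`;
}
```

For example, `formatSseComment("connected")` produces the `: connected` line seen in the staging verification.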

Wave 1 Track C — Frontend Discovery (e8088705)

  • /try landing page (mobile-first, repo URL input, kill-switch + cap-exceeded fallbacks)
  • /try/:trialId discovery feed consuming the SSE event stream
  • /try/cap-exceeded + /try/waitlist/thanks pages
  • React Router entries wired into App.tsx

Wave 1 Track D — Frontend Chat Gate (1114c8fc)

  • ChatGate component: suggestion chip carousel + textarea + send button
  • LoginSheet modal triggering GitHub OAuth with claim cookie preserved
  • useTrialDraft hook: localStorage persistence of the draft across the OAuth round-trip
  • useTrialClaim hook: post-login auto-submit of the stashed draft to the claimed project's chat

Wave 2 — Integration, Automation, and Live Fix

  1. Merged all 4 Wave 1 tracks into sam/trial-onboarding-mvp. Two conflicts resolved:
    • apps/api/src/env.ts — kept both Track A + Track B TRIAL_* env vars.
    • apps/web/src/components/trial/ChatGate.tsx — kept Track D's real implementation; adapted Track C's TryDiscovery to Track D's TrialIdea contract + onAuthenticatedSubmit callback.
  2. Automated the staging trial secret (commit 086f4ded): added infra/resources/secrets.ts entry that auto-generates TRIAL_CLAIM_TOKEN_SECRET via @pulumi/random, and wired configure-secrets.sh to push it as a Worker secret. No manual wrangler secret put on staging ever.
  3. Automated the staging kill-switch (commits 086f4ded + b15ca27c): added a conditional step to .github/workflows/deploy-reusable.yml that writes trials:enabled=true to KV on every staging deploy (and only staging). Initial attempt used --remote, which is not a valid flag for wrangler kv key put — removed in b15ca27c.
  4. Discovered and fixed a Wave 1 integration bug (commit db1d6332): Track A was persisting new trials to D1 only, while Track B readers (events.ts, claim.ts, trial-runner.ts) look up trials in KV via readTrial(). Every SSE connection 404'd with "Trial not found". Fix mirrors the trial to KV in POST /api/trial/create after the D1 insert, before issuing cookies, with rollback on KV failure (D1 row deleted, TrialCounter slot released). writeTrial() also hardened to skip the trial-by-project: index when projectId is empty (would otherwise collide all pending trials on a single key). Added regression test asserting KV.put("trial:<id>", ...) is invoked on the happy path.
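The index-collision part of the fix in item 4 can be sketched as a pure function that decides which KV keys a trial record is written under. The trial:<id> and trial-by-project: prefixes appear in this PR; the fingerprint key prefix, the record shape, and the function name are assumptions.

```typescript
// Hypothetical record shape; the real one is defined by the trial-store service.
type TrialRecord = { trialId: string; projectId: string; fingerprint: string };

function trialKvKeys(record: TrialRecord): string[] {
  const keys = [`trial:${record.trialId}`];
  // A pending trial has projectId === "". Indexing it would write every pending
  // trial to the same "trial-by-project:" key, so the index is skipped.
  if (record.projectId !== "") {
    keys.push(`trial-by-project:${record.projectId}`);
  }
  // Assumed prefix for the fingerprint index.
  keys.push(`trial-by-fingerprint:${record.fingerprint}`);
  return keys;
}
```

Once the project row exists and the record is rewritten with a real projectId, the same function yields all three keys.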

Non-negotiable Constraints Verified

  • Mobile-first (375×667 authoritative) — all four trial screens rendered and screenshot-verified at mobile width
  • Public GitHub repos only — GITHUB_REPO_URL_REGEX in shared schemas
  • Locked initial prompt — discovery prompt template owned by the backend; user cannot write the first message
  • Login gate on chat interactions — ChatGate triggers LoginSheet on any send attempt by an anonymous visitor
  • Monthly cap + kill switch — TrialCounter DO + TRIAL_ENABLED env var
  • Staging uses opencode + Workers AI; production will use claude-code + Anthropic
  • Valibot for runtime validation — every request schema in packages/shared/src/trial.ts
  • System user pattern — no schema change to projects.userId; anonymous projects owned by sam_anonymous_trials until claimed
  • HMAC-signed claim cookie — uses auto-provisioned TRIAL_CLAIM_TOKEN_SECRET
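To make the "public GitHub repos only" constraint concrete, here is a sketch of URL-shape validation. The pattern below is an illustrative stand-in and is not the actual GITHUB_REPO_URL_REGEX from the shared schemas; it also only checks the URL shape, so repo privacy still has to be probed separately.

```typescript
// Illustrative pattern (an assumption): https://github.com/<owner>/<name>, optional ".git" and trailing slash.
const REPO_URL_PATTERN =
  /^https:\/\/github\.com\/([A-Za-z0-9][A-Za-z0-9-]{0,38})\/([A-Za-z0-9._-]{1,100}?)(?:\.git)?\/?$/;

function parseGithubRepoUrl(input: string): { owner: string; name: string } | null {
  const match = REPO_URL_PATTERN.exec(input.trim());
  if (!match) return null;
  return { owner: match[1], name: match[2] };
}
```

Deep links such as /tree/main are rejected by this sketch because the repo name segment may not contain a slash.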

Local Quality Gates

  • pnpm typecheck — clean across all packages
  • pnpm lint — 0 errors
  • API unit tests — 3773 / 3773 passing (includes new writeTrial regression test)
  • Web unit tests — 1863 / 1863 passing

Staging Deployments

| Run | Commit | Result |
| --- | --- | --- |
| 24614206706 | c2780059 | ✅ initial merge deploy |
| 24614985380 | pre-b15ca27c | Unknown argument: remote (fixed by removing the --remote flag) |
| 24615223155 | post-db1d6332 | ✅ final green with kill-switch KV put + all fixes |

Staging Verification (Playwright + curl, live app)

With trials:enabled=true in KV on staging, the end-to-end happy path was exercised:

| Check | Result |
| --- | --- |
| GET /api/trial/status | {"enabled":true,"remaining":1500,"resetsAt":"2026-05-01"} |
| POST /api/trial/create with https://github.com/sindresorhus/is | 201 with Set-Cookie: sam_trial_fingerprint=… + sam_trial_claim=… |
| GET /api/trial/:trialId/events via real cookies | HTTP/2 200, content-type: text/event-stream, : connected heartbeat ✅ |
| /try landing form submission on mobile 375×667 | navigates to /try/:trialId, ChatGate renders "Live" status, feed waits for events, zero console errors ✅ |
| Same on desktop 1280×800 | |

Screenshots: trial-sse-live-mobile.png, trial-sse-live-desktop.png (in .codex/tmp/playwright-screenshots/).

Regression spot-check

  • Authenticated via smoke-test token login → /dashboard renders, project list loads, 0 console errors
  • Navigation sidebar, command palette, notifications panel all intact
  • /health → 200 healthy

What was NOT verified end-to-end

The OAuth claim + post-login auto-submit leg (chat gate → login sheet → GitHub OAuth → /api/trial/:trialId/claim → stashed draft replay) requires a real GitHub OAuth round-trip with a human. All individual components have unit + integration coverage; the OAuth leg is gated behind a real sign-in and deferred to Raphaël's manual review.

Review Status

Full specialist review was not dispatched because this PR is flagged for manual review by @raphaeltm before merge. The needs-human-review label is applied. Raphaël will decide whether to dispatch additional reviewers, flip production config, and proceed to merge.

Do NOT Merge Yet

  • ❌ Do NOT merge to main until Raphaël has reviewed the configuration checklist.
  • ❌ Do NOT deploy to production until the Anthropic key is procured and the OAuth claim leg has been exercised at least once.

🤖 Generated with Claude Code

raphaeltm and others added 9 commits April 18, 2026 20:17
Lays groundwork for /try — shared types (Valibot), DB migration 0043
(system user sentinel + trial_waitlist table), wrangler TRIAL_COUNTER DO
binding (v7 migration) + trial env vars, trial services (HMAC-signed
cookies with constant-time compare, KV kill-switch with 30s cache +
fail-closed, discovery prompt), 501 route stubs under /api/trial/*,
TrialCounter DO with atomic transactionSync increment/decrement, frontend
Try/TryDiscovery stubs mounted at /try + /try/:trialId, operator docs
at docs/guides/trial-configuration.md, and 43 unit tests covering
cookie round-trip/tamper/expiry, kill-switch cache/TTL/fail-closed, and
TrialCounter cap enforcement.

Trials remain disabled by default (kill-switch fails closed) so this is
safe to deploy without setting TRIAL_CLAIM_TOKEN_SECRET. Wave 1 will wire
the live create/events/claim/waitlist handlers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implements backend lifecycle for zero-friction trial onboarding (Wave 1
Track A):

- trials table + sentinel-installation workaround (migration 0044)
- TrialCounter DO: fetch surface + tryIncrement/prune RPC methods
- POST /api/trial/create with Valibot validation, kill-switch gate,
  GitHub repo probe (size/privacy), DO slot allocation, and
  counter-decrement rollback on D1 failure
- GET /api/trial/status with fail-closed fallback when DO throws
- POST /api/trial/waitlist with lowercase-email dedupe via
  onConflictDoNothing(email, resetDate)
- Three scheduled modules wired into cron dispatch:
    - trial-expire: 5-min sweep marks expired trials
    - trial-rollover: monthly DO pruning (0 3 1 * *)
    - trial-waitlist-cleanup: daily notified-row purge (0 4 * * *)
- All configurable via DEFAULT_* constants + env overrides (Principle XI)
- 92 new behavioral tests covering resolution branches, DO RPC surface,
  fallback semantics, cookie issuance, and fail-closed error paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Builds the frontend components that gate the trial experience behind
GitHub auth — a chat input with suggestion chips for anonymous users,
and a login sheet that opens when they send their first message.
Integration into TryDiscovery (SSE streaming `trial.idea` events) lands
in wave-2 alongside the live /claim handler.

Components
- ChatGate: autogrowing textarea + horizontally-scrolling chip row;
  Cmd/Ctrl+Enter submits, Enter inserts newline; disabled state when
  empty/whitespace; surfaces submit errors without clearing the draft
- LoginSheet: responsive dialog (mobile bottom-sheet, desktop centered
  modal) with Escape/backdrop/close-button dismissal, focus trap
  between primary CTA + close, body scroll lock, return-to URL
  construction (trialId URL-encoded, ?claim=1 sentinel)
- SuggestionChip: 44px-tall touch target with title + optional summary,
  aria-label compose, disabled state

Hooks
- useTrialDraft: per-trialId localStorage draft with 400ms debounce
  (flush-on-unmount), synchronous writes when debounceMs=0, rehydrates
  on trialId change, no-ops with undefined trialId
- useTrialClaim: idle → claiming → submitting → done/error state
  machine; injectable claim/submit fns for testing; StrictMode-safe
  (single claim per mount); clears draft only on successful submit;
  preserves projectId when submit fails so UI can retry
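The useTrialClaim transitions described above can be sketched as a pure reducer, independent of React. The state and action names here are hypothetical; the sketch only mirrors the idle → claiming → submitting → done/error progression and the rule that projectId survives a failed submit.

```typescript
type ClaimState =
  | { status: "idle" }
  | { status: "claiming" }
  | { status: "submitting"; projectId: string }
  | { status: "done"; projectId: string }
  | { status: "error"; message: string; projectId?: string };

type ClaimAction =
  | { type: "claim_started" }
  | { type: "claim_succeeded"; projectId: string }
  | { type: "claim_failed"; message: string }
  | { type: "submit_succeeded" }
  | { type: "submit_failed"; message: string };

function claimReducer(state: ClaimState, action: ClaimAction): ClaimState {
  switch (action.type) {
    case "claim_started":
      // Only an idle machine may start a claim, so a repeated start is ignored.
      return state.status === "idle" ? { status: "claiming" } : state;
    case "claim_succeeded":
      return state.status === "claiming"
        ? { status: "submitting", projectId: action.projectId }
        : state;
    case "claim_failed":
      return state.status === "claiming" ? { status: "error", message: action.message } : state;
    case "submit_succeeded":
      return state.status === "submitting" ? { status: "done", projectId: state.projectId } : state;
    case "submit_failed":
      // Keep projectId so the UI can retry the submit without claiming again.
      return state.status === "submitting"
        ? { status: "error", message: action.message, projectId: state.projectId }
        : state;
  }
}
```

Ignoring out-of-order actions is one way to get the single-claim-per-mount behavior when effects run twice under StrictMode.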

Harness + tests
- TrialChatGateHarness at /__test/trial-chat-gate (public, not linked
  from nav) renders ChatGate + LoginSheet with query-param-driven mock
  data (ideas=0..20, long=1, auth=1, loginOpen=1) so Playwright can
  capture screenshots without hitting the real claim flow
- 43 new unit tests across components + hooks covering rendering,
  interactions, persistence, error states, focus management
- 13 Playwright visual scenarios at 375x667 + 1280x800: empty state,
  1/5/20 chips (page-level overflow asserted false — chip row owns
  its horizontal scroll), long-text wrapping, anonymous send opening
  LoginSheet, bottom-sheet vs centered-modal layouts, 44px touch
  targets on send button + suggestion chips

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Wire trial onboarding backend so the post-OAuth claim flow and the
per-trial event stream work end-to-end.

- TrialEventBus DO: in-memory ring buffer (MAX_BUFFERED_EVENTS=500)
  with long-poll /poll, /append with terminal-event auto-close, /close,
  waiter-wake semantics. Configurable via TRIAL_EVENT_BUS_DEFAULT_POLL_TIMEOUT_MS.
- trial-store service: KV-backed writeTrial/readTrial/markTrialClaimed
  with 3-key indexing (by trialId, by projectId, by fingerprint).
- trial-runner: mode-aware config resolution (staging=opencode+workers-ai,
  production=claude-code+anthropic); production requires ANTHROPIC_API_KEY_TRIAL.
  startDiscoveryAgent creates chat + ACP session with discovery prompt.
  emitTrialEvent/emitTrialEventForProject append to TrialEventBus best-effort.
- GET /api/trial/:trialId/events: fingerprint-cookie-authenticated SSE.
  Verifies trial record + HMAC signature + UUID match (fails closed on
  any mismatch). Heartbeat every TRIAL_SSE_HEARTBEAT_MS (default 15s);
  long-poll DO every TRIAL_SSE_POLL_TIMEOUT_MS; max duration
  TRIAL_SSE_MAX_DURATION_MS. Closes on terminal event.
- POST /api/trial/claim: auth-required; verifies HMAC claim cookie;
  atomic D1 UPDATE with WHERE userId=TRIAL_SENTINEL_USER_ID precondition;
  clears claim cookie; returns {projectId, claimedAt}. Returns 409 on
  UPDATE-changes=0 race.
- OAuth callback hook (maybeAttachTrialClaimCookie): on 2xx/3xx response
  from /callback/github, if a valid fingerprint cookie maps to an unclaimed
  non-expired trial, sign a claim token, set sam_trial_claim cookie, and
  rewrite Location to https://app.${BASE_DOMAIN}/try/:trialId?claim=1.
- Env + wrangler binding for TRIAL_EVENT_BUS Durable Object.
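The bounded buffer behind TrialEventBus can be illustrated with a small cursor-based ring buffer. This is a sketch under assumptions: the class and method names are hypothetical, and long-polling, waiter wake-up, and terminal-event closing are left out.

```typescript
type BufferedEvent<T> = { cursor: number; event: T };

class EventRingBuffer<T> {
  private readonly capacity: number;
  private events: BufferedEvent<T>[] = [];
  private nextCursor = 1;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Append an event and drop the oldest one once the capacity is exceeded.
  append(event: T): number {
    const cursor = this.nextCursor++;
    this.events.push({ cursor, event });
    if (this.events.length > this.capacity) this.events.shift();
    return cursor;
  }

  // Return every buffered event newer than the caller's last seen cursor.
  since(lastCursor: number): BufferedEvent<T>[] {
    return this.events.filter((entry) => entry.cursor > lastCursor);
  }
}
```

A poller that remembers the highest cursor it has seen can call `since(cursor)` on each poll and never receive an event twice, as long as it keeps up with the buffer size (MAX_BUFFERED_EVENTS=500 in this PR).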

70 new unit tests (6 files) cover DO long-poll/waiter-wake/terminal-close,
SSE auth-failure matrix + happy path, claim route 400/404/409/200 branches,
oauth-hook bail-out matrix + rewrite happy path, trial-runner config
resolution + error paths, and trial-store round-trips.

Replaces Wave 0 stubs with full trial discovery flow:

- Try landing page with GitHub URL validation + error branches
  (invalid_url, repo_private, trials_disabled, cap_exceeded, existing_trial)
- TryDiscovery streams SSE events (started, progress, knowledge, idea,
  ready) with exponential backoff reconnect (max 5 retries) and renders
  repo header, progress, knowledge graph, ideas, and workspace-ready CTA
- TryCapExceeded page with waitlist email capture + inline validation
- TryWaitlistThanks confirmation page
- trial-api client: createTrial, joinWaitlist, openTrialEventStream
- ChatGate stub placeholder for Track D integration
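The reconnect policy mentioned for TryDiscovery (exponential backoff, max 5 retries) could look like the sketch below. Only the retry limit comes from the commit message; the base delay, the ceiling, and the function name are assumptions.

```typescript
const MAX_RETRIES = 5; // from the commit message

// attempt is 0-based: 0 is the first reconnect after a dropped stream.
// Returns the delay before the next attempt, or null when retries are exhausted.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30_000): number | null {
  if (attempt >= MAX_RETRIES) return null;
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

With the assumed defaults the delays are 1s, 2s, 4s, 8s, and 16s, after which the caller would surface an error instead of reconnecting.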

Tests:
- Vitest component tests for Try + TryCapExceeded (11 cases: URL
  validation, success nav, existing-trial resume, each error branch,
  email validation, waitlist submit, API error)
- Playwright visual audit at 375x667 and 1280x800 covering landing,
  discovery (streaming/ready/empty), cap-exceeded, waitlist-thanks, and
  all inline error states — overflow asserted on every test

Mobile-first with design tokens; 56px primary CTA, 44px secondary
touch targets; env(safe-area-inset-*) padding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… integration

Resolves conflict in ChatGate.tsx by keeping Track D's real implementation;
adapts TryDiscovery to Track D's ChatGate contract (TrialIdea shape,
onAuthenticatedSubmit handler that navigates to the claimed project chat
with the message staged in sessionStorage).

@simple-agent-manager simple-agent-manager Bot added the needs-human-review label (label description: Agent could not complete all review gates; human must approve before merge) Apr 18, 2026

… kill-switch

Previously, self-hosters had to manually run `wrangler secret put
TRIAL_CLAIM_TOKEN_SECRET` and `wrangler kv key put trials:enabled true`
before the /try flow would work on staging. Wire both into the standard
deployment pipeline so staging trials are live out of the box.

Changes:
- infra/resources/secrets.ts: add `trial-claim-token-secret` RandomId
  resource (32 bytes base64) + export `trialClaimTokenSecret` Pulumi
  output, same persistence pattern as encryptionKey / jwtPrivateKey.
- infra/index.ts: re-export the new output.
- scripts/deploy/configure-secrets.sh: read trialClaimTokenSecret from
  Pulumi state and set it as a required Worker secret on every deploy.
- .github/workflows/deploy-reusable.yml: add a staging-only step that
  sets KV `trials:enabled=true` via wrangler after the worker deploys.
  Production stays opt-in per spec (operator flips the flag manually
  when ready to accept live trial traffic).
- docs/guides/trial-configuration.md: document the automation — no more
  manual secret-put or kv-put steps for staging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

`wrangler kv key put` writes to remote by default; --remote is not a
valid flag for that subcommand and caused the staging deploy's trial
kill-switch step to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…olve it

Track A (create.ts) inserted trial records into D1 only; Track B readers
(events.ts, claim.ts, trial-runner.ts) all look trials up via
trial-store.readTrial() which reads from KV. The result: every SSE
connection 404'd with "Trial not found or expired" seconds after the
trial was created.

Integration fix:
- create.ts calls writeTrial() after the D1 insert, with projectId=''
  (Track B's orchestrator rewrites the KV record once the project row
  exists). On KV failure, roll back the D1 row and release the
  TrialCounter slot so we don't burn a cap entry.
- writeTrial() skips the trial-by-project index when projectId is
  empty, preventing all pending trials from colliding on
  `trial-by-project:`.
- events.ts: use errors.notFound('Trial') — previous argument produced
  doubled "Trial not found or expired not found".

Added a regression test asserting writeTrial is invoked from the happy
path (captures the exact KV put) so this bug cannot silently recur.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@simple-agent-manager
Contributor Author

Staging verification update — trials automation + integration fix

Two follow-up commits landed after the initial PR:

1. Deploy automation (commits 086f4ded + b15ca27c)

  • TRIAL_CLAIM_TOKEN_SECRET is now auto-provisioned by Pulumi (infra/resources/secrets.ts) and pushed by configure-secrets.sh on every deploy — no manual wrangler secret put
  • trials:enabled KV flag is set automatically by deploy-reusable.yml on staging deploys — no manual wrangler kv key put
  • Production remains opt-in (operator flips the flag when ready)

2. Wave 1 integration bug fix (commit db1d6332)

  • Track A persisted trials to D1; Track B read from KV via trial-store.readTrial(). Nothing wrote KV → every SSE /events call returned 404 "Trial not found".
  • Fix: create.ts calls writeTrial() after the D1 insert with projectId='' (Track B's orchestrator rewrites the record once the project row exists). On KV failure, D1 row is rolled back and the TrialCounter slot released.
  • Hardened writeTrial() to skip the by-project index when projectId is empty, preventing pending-trial collisions.
  • Added regression test asserting writeTrial is invoked — this bug cannot silently recur.

Staging verification evidence (run 24615223155, 2026-04-18 22:22Z):

  • /api/trial/status → {"enabled":true,"remaining":1498,"resetsAt":"2026-05-01"}
  • POST /api/trial/create with public repo URL → 201 with set-cookie fingerprint + claim cookies, returns trialId
  • GET /api/trial/:trialId/events with fingerprint cookie → HTTP/2 200 text/event-stream, : connected heartbeat received
  • /try/:trialId page renders ChatGate in "Live" state (green), zero console errors, on mobile 375×667 and desktop 1280×800

Updated configuration checklist for @raphaeltm:

  • TRIAL_CLAIM_TOKEN_SECRET — auto-provisioned by Pulumi, no action needed
  • Staging kill-switch — auto-set by deploy workflow, no action needed
  • Production kill-switch — flip trials:enabled=true manually when ready: pnpm --filter @simple-agent-manager/api exec wrangler kv key put "trials:enabled" "true" --binding KV --env production
  • Production Anthropic key — set ANTHROPIC_API_KEY_TRIAL via wrangler secret put ... --env production once procured (required for production trials — staging uses Workers AI, no key needed)
  • Optional tunables in apps/api/wrangler.toml: TRIAL_MONTHLY_CAP (default 1500), TRIAL_WORKSPACE_TTL_MS (default 20 min), TRIAL_DATA_RETENTION_HOURS (default 168)

Production deploy and merge remain deferred per your instructions.

simple-agent-manager Bot and others added 2 commits April 19, 2026 10:41
…760)

* task: move trial-orchestrator-wire-up to active

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(shared): add trial orchestrator timing/retry constants

Introduce DEFAULT_TRIAL_ORCHESTRATOR_* and DEFAULT_TRIAL_KNOWLEDGE_*
constants used by the alarm-driven TrialOrchestrator DO and the fast-path
GitHub knowledge probes fired from POST /api/trial/create. Every value is
env-var overridable (Constitution Principle XI).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(trial): add TrialOrchestrator DO binding, env vars, sentinel installation

- Declare TRIAL_ORCHESTRATOR DO binding + v9 migration in wrangler.toml
- Extend Env interface with TrialOrchestrator/Knowledge tuning knobs and
  TRIAL_ANONYMOUS_INSTALLATION_ID override
- Migration 0045 seeds the system_anonymous_trials_installation sentinel
  row so anonymous trial projects can satisfy the NOT NULL + FK constraint
  on projects.installation_id without owning a real GitHub App install

The DO class itself is added in the next commit.

* feat(trial): add TrialOrchestrator DO state machine

Adds the alarm-driven TrialOrchestrator Durable Object (one per trialId)
that replaces the fire-and-forget `waitUntil(provisionTrial())` pattern
with a resumable state machine.

Module layout mirrors TaskRunner:
  - types.ts     — TrialOrchestratorStep union + persisted state shape
  - helpers.ts   — re-exports TaskRunner helpers; adds sentinel-user /
                   sentinel-installation resolvers + safeEmitTrialEvent.
  - steps.ts     — per-step handlers (project_creation, node_selection,
                   node_provisioning, node_agent_ready, workspace_creation,
                   workspace_ready, discovery_agent_start, running).
  - index.ts     — DO class: start(), alarm() dispatch, backoff retry,
                   overall-timeout guard, trial.error emission on failure.

Each step emits `trial.progress` at entry so the SSE stream reflects
where the orchestrator is. Terminal `running` step is idle — the ACP
bridge (wired separately) is responsible for emitting `trial.ready`
after the discovery agent produces its first assistant turn.
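A simplified view of the alarm-driven progression is sketched below. The step order is assumed to follow the list in this commit message, and the `tick` function, state shape, and returned event list are hypothetical simplifications: the real DO persists state, runs per-step handlers, and schedules alarms.

```typescript
// Step order assumed from the steps.ts description above.
const STEPS = [
  "project_creation",
  "node_selection",
  "node_provisioning",
  "node_agent_ready",
  "workspace_creation",
  "workspace_ready",
  "discovery_agent_start",
  "running",
] as const;
type Step = (typeof STEPS)[number];

type OrchestratorState = { currentStep: Step; completed: boolean; startedAt: number };

function nextStep(current: Step): Step | null {
  const index = STEPS.indexOf(current);
  return index < STEPS.length - 1 ? STEPS[index + 1] : null;
}

// One alarm tick: returns the new state plus the names of the events it would emit.
function tick(
  state: OrchestratorState,
  now: number,
  overallTimeoutMs: number,
): { state: OrchestratorState; emitted: string[] } {
  if (state.completed) return { state, emitted: [] }; // terminal no-op
  if (now - state.startedAt > overallTimeoutMs) {
    // Overall budget exceeded: fail the trial.
    return { state: { ...state, completed: true }, emitted: ["trial.error"] };
  }
  const next = nextStep(state.currentStep);
  if (next === null) {
    // "running" is the last step: nothing left to provision.
    return { state: { ...state, completed: true }, emitted: [] };
  }
  // Simplification: progress is reported when the next step is entered.
  return { state: { ...state, currentStep: next }, emitted: ["trial.progress"] };
}
```

Because each tick depends only on persisted state and the current time, a restarted DO can resume from wherever the last alarm left off.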

All timing/retry knobs read from env vars with DEFAULT_* fallbacks
(Constitution Principle XI). Adds two new optional env fields:
TRIAL_VM_SIZE and TRIAL_VM_LOCATION for trial-specific VM overrides.

Exports the class from apps/api/src/index.ts so the Workers runtime
can instantiate it via the TRIAL_ORCHESTRATOR binding (already declared
in wrangler.toml v9 migration).

Task: tasks/active/2026-04-19-trial-orchestrator-wire-up.md

* feat(trial): bridge ACP/MCP events into trial SSE stream

Adds a dedicated `services/trial/bridge.ts` module with three helpers that
hook into existing hot paths and fan qualifying events out as `trial.*` SSE
events:

  - bridgeAcpSessionTransition: `running` → trial.ready (with workspaceUrl
    derived from BASE_DOMAIN + workspaceId), `failed` → trial.error.
  - bridgeKnowledgeAdded:       fires trial.knowledge when the discovery
    agent adds a knowledge observation via MCP.
  - bridgeIdeaCreated:          fires trial.idea with a summary-clipped
    excerpt when the discovery agent creates an idea via MCP.

All three helpers short-circuit on non-trial projects after a single
`readTrialByProject(env, projectId)` KV lookup, so normal (non-trial)
project traffic only pays that one extra KV read on qualifying events.

Hook sites:
  - ProjectData DO `transitionAcpSession` — dynamic-imports the bridge
    and dispatches after the transition succeeds, guarded by `if (projectId)`
    and wrapped in try/catch so bridge errors never block the transition.
    Casts `this.env` through unknown to the worker-scope Env because the
    DO's local Env type is intentionally narrow.
  - `handleAddKnowledge` MCP handler — dispatches after addKnowledgeObservation.
  - `handleCreateIdea`   MCP handler — dispatches after the DB insert.

Every dispatch is fire-and-forget; bridge errors are already caught
inside each helper but the call sites add a second try/catch for defense.
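The transition mapping performed by bridgeAcpSessionTransition can be sketched as a pure function. The function name, the event payload shapes, the error message, and the https scheme are assumptions; the ws-{id}.{BASE_DOMAIN} host form is taken from the tests listed later in this PR.

```typescript
type TrialBridgeEvent =
  | { type: "trial.ready"; workspaceUrl: string }
  | { type: "trial.error"; message: string };

function mapAcpTransition(
  toStatus: string,
  workspaceId: string,
  baseDomain: string,
): TrialBridgeEvent | null {
  if (toStatus === "running") {
    // No fallback domain: an empty BASE_DOMAIN is treated as a misconfiguration.
    if (baseDomain === "") throw new Error("BASE_DOMAIN is not configured");
    return { type: "trial.ready", workspaceUrl: `https://ws-${workspaceId}.${baseDomain}` };
  }
  if (toStatus === "failed") {
    return { type: "trial.error", message: "Discovery agent session failed" };
  }
  // Every other transition produces no trial event.
  return null;
}
```

Keeping the mapping pure leaves the KV lookup and the fire-and-forget error handling at the call site, where a thrown error is caught rather than blocking the transition.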

Task: tasks/active/2026-04-19-trial-orchestrator-wire-up.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(trial): wire TrialOrchestrator + GitHub knowledge into POST /api/trial/create

Adds two fire-and-forget dispatches after the trial record is written and
before the HTTP response returns, via c.executionCtx.waitUntil:

1. TrialOrchestrator DO `start()` — kicks off the alarm-driven state machine
   that provisions a project, workspace, and discovery agent session. The
   DO is idempotent on `start()`, so accidental re-invocations no-op.

2. emitGithubKnowledgeEvents() — hits unauthenticated GitHub REST endpoints
   (`/repos/:o/:n`, `/repos/:o/:n/languages`, `/repos/:o/:n/readme`) in
   parallel and emits up to `TRIAL_KNOWLEDGE_MAX_EVENTS` `trial.knowledge`
   events within ~`TRIAL_KNOWLEDGE_GITHUB_TIMEOUT_MS` each. Surfaces
   description, primary language, stars, topics, license, language breakdown,
   and README first paragraph so the SSE stream shows activity within ~3s
   while the VM provisions in the background.

Both helpers fully swallow errors — an orchestrator dispatch failure or
GitHub rate-limit hit never blocks the response or crashes the Worker.

All knobs are env-configurable per Constitution Principle XI:
- TRIAL_KNOWLEDGE_GITHUB_TIMEOUT_MS (default 5000)
- TRIAL_KNOWLEDGE_MAX_EVENTS (default 10)
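How the probe results might be turned into a capped list of knowledge observations is sketched below. The field names on `RepoMetadata` follow the public GitHub REST repository response; the function name, the output strings, and the README paragraph heuristic are assumptions, and the network fetches and timeouts are omitted.

```typescript
type RepoMetadata = {
  description: string | null;
  language: string | null;
  stargazers_count: number;
  topics: string[];
  license: { spdx_id: string | null } | null;
};

function buildKnowledgeEvents(
  repo: RepoMetadata,
  languages: Record<string, number>, // language name -> bytes, as returned by /languages
  readmeText: string | null, // decoded README text, if available
  maxEvents: number, // TRIAL_KNOWLEDGE_MAX_EVENTS
): string[] {
  const out: string[] = [];
  if (repo.description) out.push(`Description: ${repo.description}`);
  if (repo.language) out.push(`Primary language: ${repo.language}`);
  out.push(`Stars: ${repo.stargazers_count}`);
  if (repo.topics.length > 0) out.push(`Topics: ${repo.topics.join(", ")}`);
  if (repo.license?.spdx_id) out.push(`License: ${repo.license.spdx_id}`);

  const totalBytes = Object.values(languages).reduce((sum, bytes) => sum + bytes, 0);
  if (totalBytes > 0) {
    const breakdown = Object.entries(languages)
      .sort((a, b) => b[1] - a[1])
      .map(([name, bytes]) => `${name} ${Math.round((bytes / totalBytes) * 100)}%`)
      .join(", ");
    out.push(`Languages: ${breakdown}`);
  }

  // Heuristic: first non-empty block that is not a Markdown heading.
  const firstParagraph = readmeText
    ?.split(/\n\s*\n/)
    .map((block) => block.trim())
    .find((block) => block !== "" && !block.startsWith("#"));
  if (firstParagraph) out.push(`README: ${firstParagraph}`);

  // Enforce the cap last so ordering decides which observations survive.
  return out.slice(0, maxEvents);
}
```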

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(trial): cover orchestrator dispatch, bridge, and GitHub knowledge probe

Adds four categories of behavioral tests for the trial onboarding wiring:

1. trial-create.ts.test.ts (+2 cases)
   - Asserts TrialOrchestrator.start() is dispatched via waitUntil with
     trialId, repoOwner, repoName, and canonical repoUrl.
   - Asserts a rejecting start() does NOT propagate — the HTTP response
     still returns 201 (fire-and-forget contract).
   - Updates makeEnv() to stub TRIAL_ORCHESTRATOR + TRIAL_EVENT_BUS
     bindings and introduces makeExecutionCtx() helper.
   - Also adds a graceful-fallback in create.ts so routes that run without
     a Worker executionCtx (unit tests) still complete instead of 500-ing
     on Hono's "This context has no ExecutionContext" throw.

2. trial-github-knowledge.test.ts (new, 5 cases)
   - Happy path: verifies description, primary language, stars, topics,
     license, language breakdown, and README paragraph are all emitted.
   - TRIAL_KNOWLEDGE_MAX_EVENTS cap is enforced.
   - Total network failure → 0 events, no throw.
   - Non-2xx repo metadata response → 0 events, no throw.
   - emitTrialEvent rejection → no throw (last line of defense).

3. trial-orchestrator.test.ts (new, 4 cases)
   - start() persists initial state with currentStep='project_creation'
     and schedules an alarm.
   - start() is idempotent — second call with same input is a no-op and
     does not re-schedule the alarm.
   - alarm() on a completed state is a terminal no-op.
   - alarm() emits trial.error and marks completed when the overall
     timeout budget is exceeded.

4. trial-bridge.test.ts (new, 9 cases)
   - bridgeAcpSessionTransition: no-ops on non-trial projects, emits
     trial.ready on 'running' with ws-{id}.{BASE_DOMAIN} URL, emits
     trial.error on 'failed', no-ops on other transitions, swallows
     emitter errors.
   - bridgeKnowledgeAdded / bridgeIdeaCreated: no-op on non-trial,
     emit correct event shape when trial exists, swallow errors.

All 3,793 tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(trial): document TrialOrchestrator + GitHub knowledge fast-path

Adds an "Orchestrator and Fast-Path Knowledge" section to the trial
configuration guide covering the two fire-and-forget background tasks
dispatched from POST /api/trial/create (TrialOrchestrator DO and the
GitHub REST knowledge probe) plus the ACP/MCP event bridge, with
tunables tables for both.

Also records the change in CLAUDE.md "Recent Changes" and marks the
corresponding checklist items in the task file.

* style(trial): sort imports per eslint rules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(trial): emit trial.started event from orchestrator start()

The SSE stream's first real event must be `trial.started` so the
frontend can transition out of the "Warming up..." empty state.
Without it, viewers sat on the placeholder until `trial.progress` or
`trial.knowledge` arrived — which could be 3-5s later.

Added unit test asserting `emitTrialEvent` is called exactly once with
type='trial.started' and the expected shape.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(trial): capability test chaining start() + alarm() through event bus

Addresses task-completion-validator HIGH finding #2: no capability
test exercised the full orchestrator state machine through the event
bus seam. Existing per-method tests covered each transition in
isolation but did not chain them.

New test drives:
  start() → persist + setAlarm + emit trial.started
    → (simulate expired budget)
    → alarm() → mark failed + emit trial.error

The `emitTrialEvent` mock is the event-bus seam; its downstream is
already covered by tests/unit/routes/trial-events.test.ts which
verifies the bus → SSE stream path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(trial): archive orchestrator wire-up task

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(trial): cover alarm() retry/backoff + step handler invariants

Addresses test-engineer review HIGH findings #1 and #2 (partial).

Finding #1 — alarm() retry/backoff:
Added 4 tests driving the step-error catch branches via a `./steps`
vi.mock. Covers transient-error + retries-remaining (increments counter
and schedules backoff, no failTrial), permanent-error (immediate
failTrial regardless of budget), transient-error with retries exhausted
(promotes to failTrial), and the null-state guard (alarm fires before
start()).
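
The four branches covered by these tests can be summarised as a pure decision function. This is only a sketch of the behaviour described above: `decideOnStepError`, `MAX_RETRIES`, `BASE_BACKOFF_MS`, and the exponential backoff curve are assumptions for illustration, not the orchestrator's actual names or tunables.

```typescript
// Hypothetical types and constants; only the four outcomes come from the commit message.
type StepError = { kind: "transient" | "permanent"; message: string };

interface RetryState {
  retries: number;
}

type AlarmOutcome =
  | { action: "noop" } // alarm fired before start(): nothing to do
  | { action: "retry"; retries: number; delayMs: number }
  | { action: "fail"; reason: string };

const MAX_RETRIES = 5; // assumed retry budget
const BASE_BACKOFF_MS = 1_000; // assumed base delay

function decideOnStepError(state: RetryState | null, err: StepError): AlarmOutcome {
  // Null-state guard: the alarm can fire before start() has persisted anything.
  if (state === null) return { action: "noop" };
  // Permanent errors fail immediately, regardless of remaining budget.
  if (err.kind === "permanent") return { action: "fail", reason: err.message };
  // Transient error with the budget exhausted is promoted to a failure.
  if (state.retries >= MAX_RETRIES) {
    return { action: "fail", reason: `retries exhausted: ${err.message}` };
  }
  // Transient error with budget remaining: bump the counter and back off.
  const retries = state.retries + 1;
  return { action: "retry", retries, delayMs: BASE_BACKOFF_MS * 2 ** (retries - 1) };
}
```

Keeping the decision pure makes each branch testable without D1 or Durable Object mocks; the caller applies the outcome (set alarm, or call failTrial).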

Finding #2 — step handlers:
New `trial-orchestrator-steps.test.ts` covers the two highest-value
invariants that don't need D1/DO plumbing mocks:
  - handleRunning marks state.completed = true
  - handleDiscoveryAgentStart throws permanent on missing IDs
  - handleDiscoveryAgentStart is idempotent when session already linked

Broader per-handler coverage (project_creation / node_selection /
node_provisioning / node_agent_ready / workspace_creation /
workspace_ready) tracked in
tasks/backlog/2026-04-19-trial-orchestrator-step-handler-coverage.md —
those paths require mocks for drizzle + node-agent + project-data
services and are out of scope for this PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(trial): remove hardcoded BASE_DOMAIN fallback + extract heartbeat skew constant

Addresses constitution-validator findings:

HIGH — bridge.ts:41 had `env.BASE_DOMAIN || 'workspaces.example.com'` fallback.
BASE_DOMAIN is a non-optional binding; a misconfiguration that let it be empty
would silently generate workspace URLs pointing at workspaces.example.com
instead of failing loudly. Removed the fallback.

MEDIUM — steps.ts had a hardcoded `30_000` heartbeat-skew window. Extracted to
DEFAULT_TRIAL_ORCHESTRATOR_HEARTBEAT_SKEW_MS (shared), TRIAL_ORCHESTRATOR_HEARTBEAT_SKEW_MS
env override, getHeartbeatSkewMs() getter on the DO, threaded through
TrialOrchestratorContext.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(trial): per-IP rate limit on POST /api/trial/create + SSE injection guard

Addresses security-auditor HIGH findings:

1. Rate limit on POST /api/trial/create (was missing)
   - New rateLimitTrialCreate() factory (useIp=true, keyPrefix=trial-create)
   - Default 10 req/hr, configurable via RATE_LIMIT_TRIAL_CREATE env var
   - Tighter than the general anonymous bucket because each trial create
     allocates a Durable Object, fires ~4 GitHub API calls, and consumes
     a monthly trial slot
   - Mounted per-route in create.ts so the limiter sees request env
   - Regression test exercises 429 path with IP-scoped KV window

2. SSE event-name sanitization in formatSse()
   - Strips CR/LF to prevent SSE-frame injection if a future caller ever
     bypasses the TrialEvent discriminated union via `as never` casts or
     dynamic event names
   - Function now exported for direct testing
   - New trial-events-format.test.ts covers: happy path stable shape,
     CR/LF strip on hostile event name (single event frame survives),
     and JSON data escaping for embedded newlines
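
As a rough illustration of the injection guard in item 2, the sketch below strips CR/LF from the event name before it is written into the frame. The helper names are hypothetical and the real `formatSse()` may be shaped differently.

```typescript
// Remove CR and LF so an event name can never terminate the "event:" line early.
function sanitizeSseEventName(name: string): string {
  return name.replace(/[\r\n]/g, "");
}

// Hypothetical named-frame serializer used only to demonstrate the guard.
function formatNamedSseFrame(eventName: string, payload: unknown): string {
  // JSON.stringify escapes embedded newlines in string values, so the data
  // line stays on a single physical line.
  return `event: ${sanitizeSseEventName(eventName)}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

With a hostile name such as `"trial.ready\r\n\r\ndata: {...}"`, the output still contains exactly one blank-line terminator, so the client sees a single frame.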

* fix(trial): switch TrialOrchestrator to new_sqlite_classes + drop premature status gate

Addresses cloudflare-specialist HIGH findings:

1. wrangler.toml v9 migration: new_classes -> new_sqlite_classes
   Cloudflare recommends SQLite-backed storage for new DO classes; the
   KV-style ctx.storage.put() API works identically on both backends but
   SQLite is the future-forward choice. TrialOrchestrator has not yet been
   deployed to any environment (introduced in this PR chain), so flipping
   the migration type is safe.

2. handleNodeProvisioning: remove synchronous status='running' gate
   After provisionNode() returns, async-IP providers (Scaleway, GCP) leave
   the node in 'creating' status; the IP assignment and the flip to status='running' happen
   on the first heartbeat. Synchronously requiring status='running' here
   forced every async-IP trial through the retry/backoff cycle until the
   heartbeat landed, wasting retry budget and risking permanent failure on
   slow VM boots. The next step (node_agent_ready) polls heartbeat freshness
   with its own timeout, which correctly handles both sync (Hetzner) and
   async (Scaleway/GCP) provisioning paths.

Regression test: handleNodeProvisioning advances to node_agent_ready even
when provisionNode() leaves the node in 'creating' status.

* fix(trial): HMAC-verify fingerprint cookie before reusing UUID

Security-auditor HIGH: the old code extracted the fingerprint UUID from the
`sam_trial_fingerprint` cookie by splitting on the last `.` without verifying
the HMAC signature. An attacker who learned a victim's fingerprint UUID
(from logs, a captured cookie, or a prior trial row) could forge
`<victimUuid>.anything` to overwrite the `trial-by-fingerprint:<victimUuid>`
KV index to point at their own trial. The victim's subsequent OAuth hook
lookup would then redirect them to the attacker's trial project.

Fix: call verifyFingerprint(existingFp, secret) and only trust the returned
UUID. Fall back to crypto.randomUUID() on invalid / missing signature. The
secret is already resolved earlier in the same handler (lines 195-203).
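
A minimal sketch of the verify-then-trust flow, assuming the cookie has the form `<uuid>.<hex HMAC-SHA256 of uuid>`. The signing format, the helper names, and the use of Node's `node:crypto` (rather than the Web Crypto API a Worker would more likely use) are all assumptions made so the example runs synchronously.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed cookie format: "<uuid>.<hex signature>".
function signFingerprintSketch(uuid: string, secret: string): string {
  const sig = createHmac("sha256", secret).update(uuid).digest("hex");
  return `${uuid}.${sig}`;
}

// Returns the UUID only when the signature matches; otherwise null.
function verifyFingerprintSketch(cookie: string, secret: string): string | null {
  const idx = cookie.lastIndexOf(".");
  if (idx <= 0) return null;
  const uuid = cookie.slice(0, idx);
  const given = Buffer.from(cookie.slice(idx + 1), "utf8");
  const expected = Buffer.from(
    createHmac("sha256", secret).update(uuid).digest("hex"),
    "utf8",
  );
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length) return null;
  return timingSafeEqual(given, expected) ? uuid : null;
}

// Trust only a verified UUID; otherwise mint a fresh one via the supplied factory
// (crypto.randomUUID() in the handler described above).
function resolveFingerprintUuid(
  cookie: string | undefined,
  secret: string,
  mint: () => string,
): string {
  const verified = cookie ? verifyFingerprintSketch(cookie, secret) : null;
  return verified ?? mint();
}
```

A forged `<victimUuid>.anything` cookie fails verification here and falls through to `mint()`, so the victim's KV index entry is never touched.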

Added regression test in trial-create.ts.test.ts — a forged cookie MUST NOT
reuse the victim's UUID; a fresh UUID is minted instead. Updated the
"reuses existing fingerprint" test to use a validly-signed cookie.

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* task: move trial-onboarding-ux-polish to active

* feat(trial): polish discovery feed with skeleton timeline + knowledge grouping

- Extract all timing/threshold constants to trial-ui-config.ts (Constitution XI)
- Add STAGE_LABELS map + friendlyStageLabel() for orchestrator stage strings
- TryDiscovery: render StageSkeleton timeline before first SSE event arrives
- TryDiscovery: group rapid trial.knowledge events into a single card
- TryDiscovery: surface "taking longer than usual" hint when SSE silent for 20s
- TryDiscovery: retry-aware terminal error panel
- ChatGate: spinner + aria-busy on send, snap-x chip scroll, anonymous hint copy
- Try: friendlier validation copy, testid hooks for landing audit

* test(trial): cover stage-label mapping + skeleton/error/knowledge-burst Playwright cases

* task: archive trial-onboarding-ux-polish

* fix(trial): SSE replay dedup, accessible badges, larger touch targets

Addresses Phase 5 review findings on the trial onboarding UX polish PR:

CRITICAL — SSE event replay duplication
  EventSource silently re-opens after a transport error and the server may
  replay any buffered events the client missed. Without dedup, the feed
  duplicated every replayed event. Add a composite (`type:at`) dedup set
  in TryDiscovery that resets on trialId change.
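
The dedup idea can be sketched as a small filter keyed on `type:at`. `eventDedupKey` is named in the tests below; the `createDedupFilter` wrapper and the assumption that `at` is a numeric timestamp are illustrative only.

```typescript
// Assumed minimal event shape: a type discriminator plus a timestamp.
interface TrialEventLike {
  type: string;
  at: number;
}

// Composite key: the same event replayed after a reconnect yields the same key.
function eventDedupKey(e: TrialEventLike): string {
  return `${e.type}:${e.at}`;
}

// Hypothetical wrapper around the dedup set.
function createDedupFilter() {
  const seen = new Set<string>();
  return {
    // Returns true when the event is new and should be appended to the feed.
    accept(e: TrialEventLike): boolean {
      const key = eventDedupKey(e);
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    },
    // Call when trialId changes so a new trial starts with an empty set.
    reset(): void {
      seen.clear();
    },
  };
}
```

Two different event types sharing a timestamp produce different keys, so they are both kept.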

HIGH — color-only ConnectionBadge (WCAG 1.4.1)
  Status was conveyed by background color alone. Prepend a Unicode shape
  indicator (●/✕/↺/○) so the meaning is also conveyed in monochrome.

HIGH — knowledge toggle hit area (WCAG 2.5.5)
  The "+N more" toggle on grouped knowledge cards was 24px tall — below
  the 44px touch-target minimum. Promote to min-h-11 with vertical hit
  padding.

MEDIUM — semantic header role + truncation hint
  The sticky discovery header used role="banner" (reserved for the
  page-wide masthead) and the truncated repo title had no full-text
  hover affordance. Switch to role="region" + aria-label and move the
  title attribute to the truncating wrapper.

LOW — error CTA touch targets
  The "Try again" / "Join the waitlist" Links were below 44px. Promote
  to inline-flex min-h-[44px].

Tests
  - try-discovery-dedup.test.ts: behavioural coverage of eventDedupKey
    and the dedup branch in onEvent (3 scenarios: identical replay,
    chronological non-collision, type-vs-timestamp collision).
  - try-discovery-build-feed.test.ts: boundary coverage of buildFeed
    (within-window merge, exact-boundary `<=` merge, +1ms split,
    interleaved non-knowledge break, error-event exclusion).
  - ChatGate.test.tsx: spinner visible/hidden behavioural test using a
    deferred promise (idle → sending → resolved transitions).
  - trial-ui-audit.spec.ts: knowledge-burst test now asserts exactly one
    grouped card (was: presence only) and exercises the expand toggle.
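
The boundary cases listed for buildFeed can be illustrated with a simplified grouping pass. This is a sketch, not the real implementation: `GROUP_WINDOW_MS`, the item shapes, and measuring the window from the previous knowledge event (rather than the first in the group) are assumptions.

```typescript
interface KnowledgeEvent { type: "trial.knowledge"; at: number; text: string }
interface OtherEvent { type: "trial.progress" | "trial.idea" | "trial.ready"; at: number }
interface ErrorEvent { type: "trial.error"; at: number }
type FeedInput = KnowledgeEvent | OtherEvent | ErrorEvent;

type FeedItem =
  | { kind: "knowledge-group"; items: KnowledgeEvent[] }
  | { kind: "single"; event: OtherEvent };

const GROUP_WINDOW_MS = 2_000; // hypothetical grouping window

function buildFeedSketch(events: FeedInput[]): FeedItem[] {
  const feed: FeedItem[] = [];
  for (const e of events) {
    // Errors are excluded from the feed (they render in a separate panel).
    if (e.type === "trial.error") continue;
    if (e.type === "trial.knowledge") {
      const last = feed[feed.length - 1];
      if (last && last.kind === "knowledge-group") {
        const prev = last.items[last.items.length - 1];
        // Inclusive boundary: a gap of exactly the window still merges.
        if (prev && e.at - prev.at <= GROUP_WINDOW_MS) {
          last.items.push(e);
          continue;
        }
      }
      feed.push({ kind: "knowledge-group", items: [e] });
    } else {
      // Any non-knowledge event breaks the current group.
      feed.push({ kind: "single", event: e });
    }
  }
  return feed;
}
```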

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(trial): keep StageSkeleton visible after lone trial.started; forward Alert testid

Two narrow fixes uncovered by Playwright visual audit:

1. **StageSkeleton hides too eagerly.** `showSkeleton = events.length === 0`
   meant a lone `trial.started` event (which is just an acknowledgement,
   not visible progress) caused the "Setting things up" roadmap to vanish
   while the user was still staring at a blank screen. Tighten to "no
   substantive events yet" — keep showing the roadmap until a real
   progress / knowledge / idea / ready / error event arrives.

2. **`Alert` drops `data-testid`.** The shared design-system `Alert`
   component didn't declare or forward `data-testid`, so
   `<Alert variant="error" data-testid="trial-error-panel">` silently
   discarded the prop and the terminal-error Playwright assertion
   couldn't find the panel. Add the prop to `AlertProps` and forward it
   to the rendered `<div role="alert">`.
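
For the first fix, the tightened predicate might look like the sketch below. The event type names are taken from this PR; the function name and the set constant are illustrative.

```typescript
type TrialEventType =
  | "trial.started"
  | "trial.progress"
  | "trial.knowledge"
  | "trial.idea"
  | "trial.ready"
  | "trial.error";

// Events that represent visible progress; trial.started is only an acknowledgement.
const SUBSTANTIVE_TYPES = new Set<TrialEventType>([
  "trial.progress",
  "trial.knowledge",
  "trial.idea",
  "trial.ready",
  "trial.error",
]);

// Show the roadmap until at least one substantive event has arrived.
function shouldShowSkeleton(events: ReadonlyArray<{ type: TrialEventType }>): boolean {
  return !events.some((e) => SUBSTANTIVE_TYPES.has(e.type));
}
```

Compared with `events.length === 0`, this keeps the skeleton visible when the only event so far is `trial.started`.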

All 45 Playwright trial-ui-audit tests now pass across iPhone SE,
iPhone 14, and Desktop projects.

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
simple-agent-manager Bot and others added 5 commits April 19, 2026 15:58
)

* task: move trial-events-debug to active

* task: instrument trial event bus path for staging triage

Add high-signal log.info points at every boundary in the trial event
flow so `wrangler tail` can show exactly where the pipeline drops:

- create.ts: log dispatch_begin, orchestrator_task.{enter,stub_ready,
  start_returned}, knowledge_task.{enter,done}, waitUntil_registered
- trial-runner.ts:emitTrialEvent — log emit_begin / emit_ok
- trial-orchestrator: start.enter, state_put, alarm_set,
  trial_started_emitted; alarm.enter
- trial-event-bus: handleAppend.enter / stored / rejected_closed

Pure instrumentation — no behavior change. Will be pared back or
removed once the failure mode is identified on staging.

* fix(trial): emit unnamed SSE frames so EventSource.onmessage fires

Root cause of the zero-events-on-staging incident (2026-04-19):
formatSse() wrote named SSE frames ('event: trial.knowledge\ndata: {...}')
but the frontend subscribes via source.onmessage, which only fires for the
default (unnamed) event. Bytes arrived on the wire — curl saw them — but no
frontend-visible event was ever dispatched.

Change the SSE serializer to emit unnamed frames ('data: {...}'). The
TrialEvent payload itself carries a 'type' discriminator so no information
is lost. Update the unit test to lock in the new contract (no 'event:' line)
and point at the post-mortem.
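
A sketch of the unnamed-frame contract, assuming a payload with a `type` discriminator. `formatSseUnnamed` and `parseSseDataFrame` are illustrative names; the point is that a frame with no `event:` line is dispatched as the default `message` event, which is what `EventSource.onmessage` listens for.

```typescript
// Assumed payload shape: the discriminator travels inside the JSON body.
interface TrialEventPayload {
  type: string;
  [key: string]: unknown;
}

// Unnamed frame: only a data line followed by the blank-line terminator.
function formatSseUnnamed(event: TrialEventPayload): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Test helper: recover the payload from a raw frame the way a client would.
function parseSseDataFrame(frame: string): TrialEventPayload {
  const data = frame
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length))
    .join("\n");
  return JSON.parse(data) as TrialEventPayload;
}
```

On the client, the handler then switches on `JSON.parse(msg.data).type` instead of registering one `addEventListener` per event name.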

Also fix a latent eventsUrl contract mismatch: POST /api/trial/create
returned '/api/trial/events?trialId=X' while the real route is
'/api/trial/:trialId/events'. The frontend builds its own URL so end-users
weren't affected, but the response-field contract was wrong. The previous
unit test used toContain() on a substring, masking the drift.

See docs/notes/2026-04-19-trial-sse-named-events-postmortem.md.

* test(trial): add TrialEventBus → SSE capability test

Regression guard for the 2026-04-19 incident. Seeds a trial in KV, appends
events directly on the TrialEventBus DO (identical to emitTrialEvent()),
opens the SSE stream via SELF.fetch with a valid fingerprint cookie, reads
the raw stream bytes, and asserts:

  - HTTP 200 + correct content-type
  - At least one 'data: {...}' frame
  - No 'event:' line anywhere (the regression guard)
  - The parsed JSON payload round-trips through the bus intact

Also add TRIAL_EVENT_BUS DO binding and TRIAL_* env bindings to the workers
vitest config so this test (and future trial-related worker tests) can
construct stubs.

Note: the existing workers test pool is currently broken on this branch and
base (miniflare WebSocket exits unexpectedly on all 6 pre-existing worker
tests too — not caused by this change). Once the pool is unblocked this
test runs as-is.

* docs(trial): post-mortem + rule 13 ban curl-only SSE verification

Post-mortem covers what broke, the two-layer contract mismatch (named SSE
events + wrong eventsUrl shape), timeline, why it wasn't caught (no E2E
capability test, curl used instead of a real browser, frontend test path
not exercised), the class of bug, and the process fixes landing in this PR.

Update rule 13 (staging verification) to explicitly ban curl-only
verification for browser-consumed SSE/WebSocket streams — curl confirms the
byte stream, only a real browser confirms dispatch to onmessage.

* task: record root cause + fixes on trial SSE events task

* test(trial): update trial-events.test SSE assertion for unnamed frames

The integration test for GET /api/trial/:trialId/events was asserting the
old named-event contract ('event: trial.ready'). With the formatSse() fix
the frame is unnamed; update the assertion to lock in the new contract
(data: line present, no event: line).

* task: archive trial SSE events debugging task

* chore(trial): address review findings on SSE events fix

- Add TRIAL_ORCHESTRATOR + TRIAL_COUNTER DO bindings to
  apps/api/vitest.workers.config.ts (cloudflare-specialist MEDIUM)
- CLAUDE.md: prepend 'trial-sse-events-fix' entry to Recent Changes
  (doc-sync-validator MEDIUM)
- Fix broken link in postmortem (tasks/active -> docs/notes) and tick
  the completed rule-13 follow-up checkbox (doc-sync-validator LOW)
- Add cross-reference from .claude/rules/02-quality-gates.md to the
  rule-13 curl-only SSE-verification ban (doc-sync-validator LOW)
- File pre-existing HIGH (AbortController not propagated into
  busStub.fetch) and MEDIUM (nextCursor persistence) as backlog tasks
  so they're tracked but don't block this fix PR

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
…764)

* task: move trial orchestrator agent-boot task to active

* feat(trial): boot discovery agent on VM + detect real default branch

Two bugs blocked the trial demo from working end-to-end:

1. handleDiscoveryAgentStart only created chat + ACP session records but
   never called createAgentSessionOnNode / startAgentSessionOnNode. The
   ACP session sat in `pending` forever, never transitioning to `running`,
   so `trial.ready` never fired.
2. Project defaultBranch + workspace branch were hardcoded to 'main', so
   trials on master-default repos (e.g. octocat/Hello-World) failed the
   VM-side `git clone --branch main`.

Fix (mirrors TaskRunner's agent-session-step pattern):

- Add `defaultBranch`, `mcpToken`, `agentSessionCreatedOnVm`,
  `agentStartedOnVm`, `acpAssignedOnVm`, `acpRunningOnVm` fields to
  TrialOrchestratorState for crash-safe idempotency.
- `fetchDefaultBranch()` probes GitHub's public API with a 5s
  AbortController timeout (TRIAL_GITHUB_TIMEOUT_MS override), falls
  back to 'main' on any failure. Threaded through both
  `projects.default_branch` and the workspace-side `git clone --branch`.
- `handleDiscoveryAgentStart` now runs a 5-step idempotent flow:
    1. startDiscoveryAgent (existing) -> chat + ACP session records.
    2. createAgentSessionOnNode (new) -> D1 agent_sessions row + VM
       agent registers the session.
    3. generateMcpToken + storeMcpToken (new) -> KV token so the agent
       can call add_knowledge / create_idea.
    4. startAgentSessionOnNode (new) -> VM agent boots the agent
       subprocess with the discovery prompt + MCP server URL.
    5. transitionAcpSession pending -> assigned -> running -> the trial
       bridge emits `trial.ready` with workspaceUrl.
- Trial's synthetic taskId = state.trialId (trials have no tasks row),
  so MCP rate-limiting keys per-trial. Drop get_instructions from the
  initial prompt since it'd 404 against the tasks table.
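
The crash-safe idempotency described above can be pictured as a pure "what runs next" function over the persisted flags. This is a simplified sketch: `sessionRecordsCreated` is a hypothetical flag standing in for however step 1 is detected, and the real handler performs the async calls itself rather than returning an action name.

```typescript
// Flags mirror the new TrialOrchestratorState fields; sessionRecordsCreated is hypothetical.
interface BootFlags {
  sessionRecordsCreated: boolean;
  agentSessionCreatedOnVm: boolean;
  mcpToken: string | null;
  agentStartedOnVm: boolean;
  acpAssignedOnVm: boolean;
  acpRunningOnVm: boolean;
}

type BootAction =
  | "startDiscoveryAgent"
  | "createAgentSessionOnNode"
  | "generateAndStoreMcpToken"
  | "startAgentSessionOnNode"
  | "transitionAcpToAssigned"
  | "transitionAcpToRunning"
  | "done";

// Returns the first step whose completion flag is not yet set, so a retry
// after a crash resumes where the previous attempt stopped.
function nextBootAction(s: BootFlags): BootAction {
  if (!s.sessionRecordsCreated) return "startDiscoveryAgent";
  if (!s.agentSessionCreatedOnVm) return "createAgentSessionOnNode";
  if (s.mcpToken === null) return "generateAndStoreMcpToken";
  if (!s.agentStartedOnVm) return "startAgentSessionOnNode";
  if (!s.acpAssignedOnVm) return "transitionAcpToAssigned";
  if (!s.acpRunningOnVm) return "transitionAcpToRunning";
  return "done";
}
```

Persisting each flag immediately after its step succeeds is what makes a re-entered alarm skip work that already happened.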

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(trial): capability coverage for orchestrator VM agent boot

Adds trial-orchestrator-agent-boot.test.ts asserting the 3-step VM boot
pattern + ACP pending→assigned→running transitions + idempotency across
crash/retry. Updates trial-orchestrator-steps.test.ts for the new nodeId
requirement and adds mocks for node-agent/mcp-token/project-data services.

Also adds fetchDefaultBranch coverage (master, 404 fallback, network error
fallback, idempotent re-entry).

Post-mortem at docs/notes/2026-04-19-trial-orchestrator-agent-boot-postmortem.md.
Process fix: adds port-of-pattern coverage bullet to
.claude/rules/10-e2e-verification.md so a port of TaskRunner's agent-session
pattern into a new consumer must assert every step fired.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* task: archive trial orchestrator agent-boot task

* docs(trial): add CLAUDE.md Recent Changes + TRIAL_GITHUB_TIMEOUT_MS row

* fix(trial): persist defaultBranch before D1 insert + redact mcpToken in getStatus

Cloudflare-specialist review (HIGH): two fixes
1. handleProjectCreation now persists state.defaultBranch before the D1
   projects insert. Previously a crash between the D1 write and the DO
   state persist could cause a retry to re-probe GitHub and resolve a
   different branch than what had already landed in the projects row.
2. getStatus() now redacts the live mcpToken bearer credential before
   returning state to any debug/admin caller. The stale comment claiming
   the DO doesn't store secrets is corrected.
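
A sketch of the redaction in item 2, assuming the state object carries an optional `mcpToken` field. The `redactStatus` name and the `"[redacted]"` placeholder are illustrative choices, not the actual implementation.

```typescript
// Returns a shallow copy with the bearer token masked; the stored state is not mutated.
function redactStatus<T extends { mcpToken?: string | null }>(state: T | null): T | null {
  // Uninitialized DO: nothing to redact.
  if (state === null) return null;
  return {
    ...state,
    // Keep null/undefined as-is so callers can still tell "no token yet" from "token present".
    mcpToken: state.mcpToken ? "[redacted]" : state.mcpToken,
  };
}
```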

* fix(trial): revoke MCP token on failure + redaction test + review doc sync

Addresses Phase 5 reviewer findings from the trial-agent-boot PR:

security-auditor HIGH:
- Revoke state.mcpToken in failTrial() before emitting trial.error. Mirrors
  TaskRunner's state-machine.ts:265-275 pattern; closes the 4-hour TTL
  window where a leaked/botched-trial bearer token stays usable.
- Document the intentional non-revocation in handleRunning() — orchestrator
  terminates but the discovery agent still needs the token for MCP calls
  during the 20-min workspace TTL.
- Document the sentinel userId scoping limitation on resolveAnonymousUserId
  so future trial code remembers that per-user queries do NOT isolate
  trials from each other; projectId/trialId scoping is mandatory.

task-completion-validator MEDIUM:
- New test coverage for getStatus() mcpToken redaction (both populated and
  uninitialized state branches).
- New test coverage for failTrial revocation (happy path + KV-error tolerance).

doc-sync-validator HIGH:
- Add Trial Onboarding section to .claude/skills/env-reference/SKILL.md
  cross-referencing docs/guides/trial-configuration.md for the full table.

* fix(trial): allow multiple trials per repo (partial unique index)

The `(user_id, installation_id, repository)` unique index on `projects`
prevented more than one anonymous trial per public repo — every trial
after the first on the same repo hit a UNIQUE constraint failure during
the projects insert in TrialOrchestrator.handleProjectCreation. The DO
retried 6 times on alarm backoff then emitted a terminal `trial.error`
("step_failed"), so the user saw the 10% progress event repeat and then
fail.

Why it slipped through earlier reviews: the capability tests mock D1, so
no test exercised the real constraint. Staging verification only tested
a single trial per repo. This surfaced the moment a second trial on
`octocat/Hello-World` landed during Phase 6 verification.

Fix:
- Migration 0046 drops + recreates the index as a partial unique index
  that excludes the trial-sentinel user `system_anonymous_trials`. Real
  users still can't register duplicate project rows; sentinel-owned
  trial rows are isolated by `projectId` (per helpers.ts sentinel scope
  note).
- Drizzle schema updated with matching `.where()` clause so codegen and
  migration stay in sync.

Verified locally: trial-orchestrator tests pass (28/28); typecheck clean;
lint clean (no new warnings).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
trial.ready is a provisioning milestone (workspace is up), not a signal
that discovery is complete. The discovery agent continues producing
trial.knowledge and trial.idea events after the workspace is provisioned.

Changes:
- Event bus: only auto-close on trial.error, not trial.ready
- Frontend: keep EventSource open after trial.ready with a 3-minute
  grace timer (TRIAL_DISCOVERY_STREAM_TIMEOUT_MS) for late-arriving
  discovery events
- Header shows "Discovering <repo>…" while stream is still open
  after trial.ready, then "Ready: <repo>" after stream closes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…icons

- Add TrialAgentActivityEvent type and bridgeAgentActivity() to pipe
  agent messages/tool calls into the trial SSE stream
- Hook message persistence path to emit trial.agent_activity events
- Render agent activity cards in the feed (grouped, showing tool names)
- Replace misleading "Workspace ready — chat below" with informative
  message about agent analyzing repository
- Replace emoji icons (📎, ★) with lucide-react icons (BookOpen, Lightbulb,
  Brain, Wrench, Terminal) matching platform design
- Add auto-scroll to bottom on new events (scrollIntoView smooth)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Deduplicate consecutive progress events with the same stage in the
  feed — the orchestrator re-emits keepalive progress while waiting
  for the agent, creating visual spam (3x "Starting the agent" at 70%)
- Clean up agent activity text: strip XML tags, collapse JSON blobs,
  add line-clamp-2 for overflow
- Change "AGENT WORKING..." from uppercase to normal case
- Add cleanActivityText() helper for readable tool output summaries
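
The progress deduplication could be sketched as a single pass that drops a progress event when it repeats the stage of the immediately preceding feed entry. The event shape and function name are assumptions; the stage string in the usage note is made up.

```typescript
// Assumed minimal feed event: progress events carry a stage string.
interface FeedEvent {
  type: string;
  stage?: string;
}

// Keeps the first of each run of consecutive same-stage progress events.
function dedupeConsecutiveProgress<T extends FeedEvent>(events: readonly T[]): T[] {
  const out: T[] = [];
  for (const e of events) {
    const prev = out[out.length - 1];
    const isRepeat =
      prev !== undefined &&
      e.type === "trial.progress" &&
      prev.type === "trial.progress" &&
      prev.stage === e.stage;
    if (!isRepeat) out.push(e);
  }
  return out;
}
```

Three keepalive events for a stage such as `agent_start` collapse to one card, while the same stage reappearing after an intervening knowledge event is kept.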

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
raphaeltm and others added 2 commits April 21, 2026 01:17
Merge sam/trial-discovery-stream-fix into trial MVP branch, bringing:
- Auto-scroll to bottom on new events
- Agent activity cards grouped in feed with Lucide icons
- Progress card deduplication and text cleanup
- Stream stays open after trial.ready (agent continues producing events)
- Default model switched to Qwen 3 30B

Update trial-event-bus test to match new behavior: trial.ready no
longer closes the bus since the discovery agent continues producing
knowledge and idea events after workspace provisioning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add AI usage section to the admin analytics dashboard, powered by
the AI Gateway Logs API. Shows token usage, estimated cost, trial
vs. authenticated breakdown, per-model metrics, and daily trends.

Backend:
- New admin endpoint GET /api/admin/analytics/ai-usage?period=7d
  queries AI Gateway logs with pagination and aggregates by model/day
- AI proxy now tags requests with projectId and trialId in
  cf-aig-metadata for trial usage attribution
- Configurable via AI_USAGE_PAGE_SIZE, AI_USAGE_MAX_PAGES env vars
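
The aggregation step might look like the sketch below. The log-entry field names (`model`, `createdAt`, `tokensIn`, `tokensOut`) are hypothetical and not taken from the AI Gateway Logs API; the sketch only shows grouping already-fetched entries by model or by day.

```typescript
// Hypothetical, already-normalized log entry.
interface GatewayLogEntry {
  model: string;
  createdAt: string; // assumed ISO 8601 UTC timestamp
  tokensIn: number;
  tokensOut: number;
}

interface UsageBucket {
  requests: number;
  tokens: number;
}

// Generic grouping: the key function decides whether we bucket by model, day, etc.
function aggregateUsage(
  entries: readonly GatewayLogEntry[],
  keyOf: (e: GatewayLogEntry) => string,
): Map<string, UsageBucket> {
  const out = new Map<string, UsageBucket>();
  for (const e of entries) {
    const key = keyOf(e);
    const bucket = out.get(key) ?? { requests: 0, tokens: 0 };
    bucket.requests += 1;
    bucket.tokens += e.tokensIn + e.tokensOut;
    out.set(key, bucket);
  }
  return out;
}

const byModel = (e: GatewayLogEntry): string => e.model;
// The first 10 characters of an ISO timestamp are the YYYY-MM-DD date.
const byDay = (e: GatewayLogEntry): string => e.createdAt.slice(0, 10);
```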

Frontend:
- AIUsageChart component with KPI cards, stacked bar chart (tokens
  by model), daily usage area chart, and model breakdown table
- Integrated into admin analytics dashboard above DAU chart
- Graceful fallback if AI Gateway is not configured (catch + null)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…stics

The CF AI Gateway Logs API uses `order_by_direction` (not `direction`) for
sort order, and error responses now include the upstream body for easier
debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Cloudflare AI Gateway Logs API enforces a maximum per_page of 50.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
simple-agent-manager Bot and others added 5 commits April 21, 2026 04:25
* fix(trial): address review findings from trial onboarding subagents

Security and correctness fixes from 7 specialist reviewers:

CRITICAL:
- Fix cookie domain mismatch: claim.ts clearClaimCookie and oauth-hook.ts
  buildClaimCookie now pass domain from BASE_DOMAIN (matching create.ts)
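
To illustrate why the mismatch mattered: a browser only overwrites or deletes a cookie when name, domain, and path all match, so a clear without `Domain=` leaves the domain-scoped cookie in place. The sketch below assumes a cookie named `sam_trial_claim` and a leading-dot domain; both are hypothetical, as are the helper names.

```typescript
const CLAIM_COOKIE = "sam_trial_claim"; // hypothetical cookie name

// Set the claim cookie scoped to the base domain.
function buildClaimCookieSketch(token: string, baseDomain: string, maxAgeSec: number): string {
  return `${CLAIM_COOKIE}=${token}; Domain=.${baseDomain}; Path=/; Max-Age=${maxAgeSec}; HttpOnly; Secure; SameSite=Lax`;
}

// Clear it with the SAME Domain and Path, otherwise the browser treats it as a different cookie.
function clearClaimCookieSketch(baseDomain: string): string {
  return `${CLAIM_COOKIE}=; Domain=.${baseDomain}; Path=/; Max-Age=0; HttpOnly; Secure; SameSite=Lax`;
}

// Test helper: extract the Domain attribute from a Set-Cookie value.
function cookieDomainAttr(setCookie: string): string | null {
  for (const part of setCookie.split(";")) {
    const [name, value] = part.trim().split("=");
    if (name !== undefined && name.toLowerCase() === "domain") return value ?? null;
  }
  return null;
}
```

A regression test can then assert that the set and clear helpers yield the same `Domain=` value for any given base domain.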

HIGH:
- TrialEventBus DO: persist `closed` flag to storage so it survives eviction
- AI proxy: sanitize error bodies — log raw errors server-side, return generic
  messages to clients (prevents internal URL/config leakage)
- Admin AI usage: sanitize CF API error responses the same way
- SSE events endpoint: add per-IP rate limiting (30 req/5min via KV)
- Deploy pipeline: forward ANTHROPIC_API_KEY_TRIAL as optional Worker secret
- sync-wrangler-config: inject ENVIRONMENT var into generated env sections
- Remove hardcoded DEFAULT_GATEWAY_ID; require AI_GATEWAY_ID from env

MEDIUM:
- Cron collision: move trial counter rollover from 03:00 to 05:00 UTC
  (avoids collision with daily analytics forward job at 03:00)
- Replace magic number in create.ts with DEFAULT_TRIAL_CLAIM_TTL_MS constant
- Add trial secrets to secrets-taxonomy.md and trial-configuration.md
- Add comprehensive trial + AI proxy env vars to .env.example
- Fix test mocks: add ctx.storage to TrialEventBus tests, add KV to SSE tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(trial): address CTO review — 6 quality improvements

1. Reject unknown IP: SSE rate limit now returns 400 when no client IP
   header is present, instead of sharing a single "unknown" bucket across
   all headerless clients. CF-Connecting-IP is always present on Workers.

2. Document KV rate limit trade-off: added inline comment explaining why
   KV's non-atomic read-modify-write is acceptable here (storm prevention,
   not exact enforcement) vs DO-based counters for credential rotation.

3. Clean up formatSse: removed unused _eventName parameter that gave the
   false impression the event name was being used. Updated all call sites
   and tests.

4. Cookie domain consistency test: new regression test suite asserting
   that buildClaimCookie, clearClaimCookie, and buildFingerprintCookie
   produce matching Domain= attributes. Explicitly demonstrates the bug
   where clearing without a domain fails to delete a domain-scoped cookie.

5. AI_GATEWAY_ID self-hoster safe: the admin AI usage endpoint returns an
   empty summary (zero counts) when AI_GATEWAY_ID is not configured, instead
   of throwing. Self-hosters who don't use AI Gateway get a clean "no data"
   admin dashboard.

6. Fix .env.example cron default: TRIAL_CRON_ROLLOVER_CRON now shows
   "0 5 1 * *" matching the actual default after the collision fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Resolves package.json version conflict (take main's newer deps) and
fixes simple-import-sort/exports error in packages/shared/src/constants/index.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Autofix export sort in apps/web/src/lib/api/index.ts
- Move useMemo before early return in AIUsageChart (rules-of-hooks)
- Prefix unused anthropicModels with _ in staging test
- Add FILE SIZE EXCEPTION comments for TryDiscovery.tsx and steps.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sonarqubecloud

Quality Gate failed

Failed conditions
6 Security Hotspots
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


@simple-agent-manager simple-agent-manager Bot merged commit 1f92ecf into main Apr 21, 2026
16 of 19 checks passed
simple-agent-manager Bot added a commit that referenced this pull request Apr 21, 2026
Covers the trial onboarding MVP (PR #758), AI proxy Anthropic routing,
Codex scope validation backfire (PR #772), and the seven-reviewer
cleanup (PR #770).

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Labels

needs-human-review Agent could not complete all review gates — human must approve before merge
